Scheduling Independent Tasks on Multi-cores with GPU Accelerators
Best Paper. More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). We present in this paper a new method for efficiently scheduling parallel applications with m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the makespan. The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of 4/3 + 1/(3k). We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance the load evenly between the heterogeneous resources. Finally, we run some simulations based on realistic benchmarks and compare the solution obtained by a relaxed version of this method to the one provided by a classical greedy algorithm and to lower bounds on the value of the optimal makespan.
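As a rough illustration of the kind of greedy baseline and makespan lower bound mentioned in the abstract above, the following Python sketch places each task on whichever CPU or GPU would finish it earliest, and computes a simple lower bound on the optimal makespan. The function names, the representation of a task as a (cpu_time, gpu_time) pair, and the earliest-finish rule are assumptions made for this example; they are not taken from the paper, whose baseline and bounds may differ.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (time on a CPU, time on a GPU): assumed representation


def greedy_earliest_finish(tasks: List[Task], m: int, k: int):
    """Assign each task, in input order, to the processor that finishes it first.

    Hypothetical helper: assumes m >= 1 CPUs and k >= 1 GPUs.
    Returns (makespan, assignment), where assignment[i] is ("CPU", index)
    or ("GPU", index) for task i.
    """
    cpu_free = [0.0] * m  # time at which each CPU becomes idle
    gpu_free = [0.0] * k  # time at which each GPU becomes idle
    assignment = []
    for cpu_time, gpu_time in tasks:
        c = min(range(m), key=cpu_free.__getitem__)  # least loaded CPU
        g = min(range(k), key=gpu_free.__getitem__)  # least loaded GPU
        if cpu_free[c] + cpu_time <= gpu_free[g] + gpu_time:
            cpu_free[c] += cpu_time
            assignment.append(("CPU", c))
        else:
            gpu_free[g] += gpu_time
            assignment.append(("GPU", g))
    return max(cpu_free + gpu_free), assignment


def makespan_lower_bound(tasks: List[Task], m: int, k: int) -> float:
    """Simple lower bound on the optimal makespan (assumes a non-empty task list).

    Every task needs at least its faster processing time, and the total of
    these times must be shared among the m + k processors.
    """
    best = [min(cpu_time, gpu_time) for cpu_time, gpu_time in tasks]
    return max(max(best), sum(best) / (m + k))


# Small made-up instance: four tasks, two CPUs, one GPU.
example_tasks = [(4, 1), (3, 3), (2, 6), (5, 1)]
example_makespan, example_assignment = greedy_earliest_finish(example_tasks, m=2, k=1)
example_bound = makespan_lower_bound(example_tasks, m=2, k=1)
```

On this four-task instance the greedy makespan and the lower bound are both 3, so the greedy schedule happens to be optimal here. In general a greedy rule of this kind carries no such guarantee, which is why comparing against lower bounds is informative.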
Scheduling independent tasks on multi-cores with GPU accelerators
More and more computers use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). We present in this paper a new method for efficiently scheduling parallel applications with m CPUs and k GPUs, where each task of the application can be processed either on a core (CPU) or on a GPU. The objective is to minimize the maximum completion time (makespan). The corresponding scheduling problem is NP-hard; we propose an efficient approximation algorithm which achieves an approximation ratio of 4/3 + 1/(3k). We first detail and analyze the method, based on a dual approximation scheme, that uses dynamic programming to balance the load evenly between the heterogeneous resources. Then, we present a faster approximation algorithm for a special case of the previous problem, where all the tasks are accelerated when assigned to a GPU, with a performance guarantee of 3/2 for any number of GPUs. We run some simulations based on realistic benchmarks and compare the solutions obtained by a relaxed version of the generic method to the one provided by a classical scheduling algorithm (HEFT). Finally, we present an implementation of the 4/3-approximation and its relaxed version on a classical linear algebra kernel in the scheduler of the xKaapi runtime system.
A study of scheduling problems with preemptions on multi-core computers with GPU accelerators
For many years, scheduling problems have been concerned either with parallel processor systems or with dedicated processors (job shop type systems). With the development of new computing architectures, this partition is no longer so obvious. Multi-core (processor) computers equipped with GPU co-processors require new scheduling strategies. This paper is devoted to a characterization of this new type of scheduling problem. After a thorough introduction of the new model of a computing system, an extension of the classical notation of scheduling problems is proposed. Special attention is paid to preemptions, since this is the feature of the new architecture that differs most from the classical model. Several scheduling algorithms, both new ones and ones refining classical approaches, are presented. Possible extensions of the model are also discussed.
Scheduling Independent Moldable Tasks on Multi-Cores with GPUs
The number of parallel systems using accelerators is growing. The technology is now mature enough to allow sustained petaflop/s. However, reaching this performance scale requires efficient scheduling algorithms to manage the heterogeneous computing resources. We present a new approach for scheduling independent tasks on multiple CPUs and multiple GPUs. The tasks are assumed to be parallelizable on CPUs using the moldable model: the final number of cores allotted to a task can be decided and set by the scheduler. More precisely, we design an algorithm aiming at minimizing the makespan (the maximum completion time of all tasks) for this scheduling problem. The proposed algorithm combines a dual approximation scheme with a fast integer linear program (ILP). It determines both the partitioning of the tasks, i.e., whether a task should be mapped to CPUs or a GPU, and the number of CPUs allotted to a moldable task if mapped to the CPUs. A worst-case analysis shows that the algorithm has an approximation ratio of . However, since the complexity of the ILP-based algorithm could be non-polynomial, we also present a proved polynomial-time algorithm with an approximation ratio of . We complement the theoretical analysis of our two novel algorithms with an experimental study. In these experiments, we compare our algorithms to a modified version of the classical HEFT algorithm, adapted to handle moldable tasks. The experimental results show that our ILP-based algorithm produces significantly shorter schedules than the modified HEFT for most of the instances. In addition, the experiments provide evidence that this ILP-based algorithm is also able in practice to solve larger problem instances in a reasonable amount of time.
Ordonnancement pour les nouvelles plateformes de calcul avec GPUs
More and more computers use hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to a characterization of this new type of scheduling problem. The most studied objective in this work is the minimization of the makespan, which is crucial for reaching the potential of new platforms in High Performance Computing. The core problem studied in this work is efficiently scheduling n independent sequential tasks on m CPUs and k GPUs, where each task of the application can be processed either on a CPU or on a GPU, with minimum makespan. This problem is NP-hard; we therefore propose approximation algorithms with performance ratios ranging from 2 to (2q+1)/(2q)+1/(2qk), q>0, and corresponding polynomial time complexities. The proposed solving method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes. Some variants of the core problem are studied: a special case where all the tasks are accelerated when assigned to a GPU, with a 3/2-approximation algorithm; a case where preemptions are allowed on CPUs but not on GPUs; and the same problem with malleable tasks, with an algorithm with a ratio of 3/2. Finally, we studied the problem with dependent tasks, providing a 6-approximation algorithm. Experiments based on realistic benchmarks have been conducted. Some algorithms have been integrated into the scheduler of the xKaapi runtime system for linear algebra kernels, and compared to the state-of-the-art algorithm HEFT.
Scheduling for new computing platforms with GPUs
More and more computers use hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like GPUs (Graphics Processing Units). These hybrid parallel platforms require new scheduling strategies. This work is devoted to a characterization of this new type of scheduling problem. The most studied objective in this work is the minimization of the makespan, which is crucial for reaching the potential of new platforms in High Performance Computing. The core problem studied in this work is efficiently scheduling n independent sequential tasks on m CPUs and k GPUs, where each task of the application can be processed either on a CPU or on a GPU, with minimum makespan. This problem is NP-hard; we therefore propose approximation algorithms with performance ratios ranging from 2 to (2q+1)/(2q)+1/(2qk), q>0, and corresponding polynomial time complexities. The proposed solving method is the first general-purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes. Some variants of the core problem are studied: a special case where all the tasks are accelerated when assigned to a GPU, with a 3/2-approximation algorithm; a case where preemptions are allowed on CPUs but not on GPUs; and the same problem with malleable tasks, with an algorithm with a ratio of 3/2. Finally, we studied the problem with dependent tasks, providing a 6-approximation algorithm. Experiments based on realistic benchmarks have been conducted. Some algorithms have been integrated into the scheduler of the xKaapi runtime system for linear algebra kernels, and compared to the state-of-the-art algorithm HEFT.
Scheduling Tasks with Precedence Constraints on Hybrid Multi-core Machines
In this work, we are interested in scheduling dependent tasks on hybrid parallel multi-core machines, composed of CPUs with additional accelerators (GPUs). The objective is to minimize the makespan, which is crucial for reaching the potential of new platforms in High Performance Computing. We provide an approximation algorithm with a performance guarantee of 6 to solve this problem. The algorithm is a two-phase solving method. The first phase assigns the tasks to the resources by rounding the solution of a linear programming formulation. The second phase uses a classical list algorithm to schedule the tasks according to the assignment phase. The proposed approach is the first generic algorithm with a performance guarantee for scheduling tasks with precedence constraints on hybrid platforms with CPU and GPU resources.
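The second phase described above can be sketched in Python as follows, assuming the CPU/GPU assignment has already been decided. The linear-programming and rounding phase is not reproduced here: the `on_gpu` mapping simply stands in for its output. The function name, the dictionary-based inputs, and the choice to process tasks in one topological order (a simple list-style rule) are assumptions made for this illustration, not the paper's implementation. The sketch uses `graphlib`, available in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, Tuple


def schedule_with_assignment(
    times: Dict[Hashable, Tuple[float, float]],  # task -> (cpu_time, gpu_time)
    on_gpu: Dict[Hashable, bool],                # task -> True if assigned to a GPU
    preds: Dict[Hashable, Iterable[Hashable]],   # task -> tasks it depends on
    m: int,
    k: int,
) -> Dict[Hashable, float]:
    """Return the finish time of every task (hypothetical helper).

    Tasks are taken in a topological order; each one starts as soon as all
    of its predecessors have finished and the least loaded processor of its
    assigned type is free.
    """
    graph = {task: tuple(preds.get(task, ())) for task in times}
    free = {"CPU": [0.0] * m, "GPU": [0.0] * k}  # idle time of each processor
    finish: Dict[Hashable, float] = {}
    for task in TopologicalSorter(graph).static_order():
        kind = "GPU" if on_gpu[task] else "CPU"
        pool = free[kind]
        p = min(range(len(pool)), key=pool.__getitem__)
        ready = max((finish[q] for q in graph[task]), default=0.0)
        start = max(ready, pool[p])
        duration = times[task][1] if on_gpu[task] else times[task][0]
        finish[task] = start + duration
        pool[p] = finish[task]
    return finish


# Made-up diamond-shaped instance: a precedes b and c, which both precede d.
example_times = {"a": (2, 1), "b": (3, 1), "c": (2, 4), "d": (1, 1)}
example_preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
example_on_gpu = {"a": True, "b": True, "c": False, "d": False}
example_finish = schedule_with_assignment(
    example_times, example_on_gpu, example_preds, m=1, k=1
)
example_makespan = max(example_finish.values())
```

With one CPU and one GPU, this instance finishes at time 4: tasks a and b run back to back on the GPU, c runs on the CPU once a is done, and d waits for both b and c.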
Approximation Algorithms for a Scheduling Problem on Multi-Cores with GPUs
No abstract.
A Family of Scheduling Algorithms for Hybrid Parallel Platforms